Introduction

FIFA, the video game developed by Electronic Arts (EA) Sports, has become the most popular sports video game in the world in recent years, largely because of its Ultimate Team game mode. The objective of Ultimate Team is to build the best possible team by buying and selling players and by buying packs of cards, much as people buy soccer trading cards in real life. Each player is rated in several categories based on real-life ability, and each of these ratings factors into the player's overall rating. At the end of each season, EA Sports creates a Team of the Season (TOTS) by selecting the best player at each position in each league based on real-life performance that season. Players who receive TOTS cards also receive a boost to their overall rating to reflect that performance. Although most TOTS choices are understandable, some confuse and sometimes anger fans, and EA has never explained how it makes its selections. Using machine learning methods and predictive modeling, we aim to determine which variables matter most when a player is chosen for TOTS, and to predict the Team of the Season for Europe's top five leagues from this season's statistics.



Methods and Materials

Materials: We retrieved complete player datasets for FIFA 17, FIFA 18, and FIFA 19 from here. We retrieved real-life statistics for the 2016-2017, 2017-2018, and 2018-2019 seasons from fbref.com. We did not use data from the 2019-2020 season because COVID-19 interrupted each league's season in March 2020.

Methods:



English Premier League

##           Truth
## Prediction Normal TOTS
##     Normal    116    7
##     TOTS        9   10
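The confusion matrix above is easier to interpret once its four counts are turned into rates. The base R sketch below does this for the Premier League test set. The counts are copied from the printed matrix; the helper name `summarise_conf_mat` is hypothetical and is not part of the original analysis.

```r
# Counts copied from the printed matrix (rows = prediction, columns = truth).
# matrix() fills column by column, so the first two values are the "Normal" truth column.
epl_conf <- matrix(
  c(116, 9, 7, 10),
  nrow = 2,
  dimnames = list(Prediction = c("Normal", "TOTS"), Truth = c("Normal", "TOTS"))
)

# Hypothetical helper: summary rates for a 2 x 2 confusion matrix.
summarise_conf_mat <- function(cm, positive = "TOTS") {
  tp <- cm[positive, positive]
  total <- sum(cm)
  c(
    accuracy    = sum(diag(cm)) / total,       # share of all players classified correctly
    sensitivity = tp / sum(cm[, positive]),    # share of true TOTS players that were found
    precision   = tp / sum(cm[positive, ]),    # share of predicted TOTS players that were right
    baseline    = max(colSums(cm)) / total     # accuracy of always predicting the larger class
  )
}

epl_summary <- summarise_conf_mat(epl_conf)
print(round(epl_summary, 3))
```

On these counts the accuracy (about 0.887) is only slightly above the roughly 0.880 obtained by labelling every player Normal, so sensitivity and precision for the TOTS class are the more informative figures.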
##                   Player revision position Int TklW OG PKcon Nation
## 1           Eric Dier 17   Normal      CDM  37   34  0     0    ENG
## 2        Adam Lallana 17     TOTS       CM  20   35  0     0    ENG
## 3          Sadio Mane 17     TOTS       RW  11   18  0     1    SEN
## 4        Victor Moses 17   Normal       RB  41   42  0     0    NGA
## 5          Paul Pogba 17   Normal       CM  37   40  0     1    FRA
## 6      Victor Wanyama 17   Normal      CDM  39   64  0     0    KEN
## 7   Philippe Coutinho 17   Normal       LW  18   25  0     0    BRA
## 8       Sergio Aguero 18     TOTS       ST   8    5  0     0    ARG
## 9           Eric Dier 18   Normal       CB  30   35  0     0    ENG
## 10 Abdoulaye Doucoure 18     TOTS      CDM  41   41  0     1    FRA
## 11   Andrew Robertson 18     TOTS       LB  24   21  0     0    SCO
## 12   Antonio Valencia 18   Normal       RB  43   37  0     0    ECU
## 13  Christian Eriksen 19     TOTS      CAM  11   27  0     0    DEN
## 14         Harry Kane 19   Normal       ST   4    7  0     0    ENG
## 15     James Maddison 19     TOTS      CAM  12   34  0     0    ENG
## 16      Callum Wilson 19   Normal       ST   1    9  0     0    ENG
##              Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast
## 1        Tottenham  22 1994 36 3043                        33.8   2   1
## 2        Liverpool  28 1988 31 2348                        26.1   8   6
## 3        Liverpool  24 1992 27 2235                        24.8  13   5
## 4          Chelsea  25 1990 34 2483                        27.6   3   2
## 5   Manchester Utd  23 1993 30 2608                        29.0   5   4
## 6        Tottenham  25 1991 36 3012                        33.5   4   1
## 7        Liverpool  24 1992 31 2227                        24.7  13   8
## 8  Manchester City  29 1988 25 1963                        21.8  21   6
## 9        Tottenham  23 1994 34 2824                        31.4   0   2
## 10         Watford  24 1993 37 3324                        36.9   7   3
## 11       Liverpool  23 1994 22 1940                        21.6   1   5
## 12  Manchester Utd  31 1985 31 2740                        30.4   3   1
## 13       Tottenham  26 1992 35 2774                        30.8   8  12
## 14       Tottenham  25 1993 28 2424                        26.9  17   4
## 15  Leicester City  21 1996 36 2831                        31.5   7   7
## 16     Bournemouth  26 1992 30 2528                        28.1  14   9
##    Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1         2  0     0    6    0    0.06    0.03           0.09             0.06
## 2         8  0     0    3    0    0.31    0.23           0.54             0.31
## 3        13  0     0    4    0    0.52    0.20           0.72             0.52
## 4         3  0     0    4    0    0.11    0.07           0.18             0.11
## 5         5  0     0    7    0    0.17    0.14           0.31             0.17
## 6         4  0     0   10    0    0.12    0.03           0.15             0.12
## 7        13  0     0    2    0    0.53    0.32           0.85             0.53
## 8        17  4     4    2    0    0.96    0.28           1.24             0.78
## 9         0  0     0    4    0    0.00    0.06           0.06             0.00
## 10        7  0     0   10    0    0.19    0.08           0.27             0.19
## 11        1  0     0    2    0    0.05    0.23           0.28             0.05
## 12        3  0     0    7    0    0.10    0.03           0.13             0.10
## 13        8  0     0    3    0    0.26    0.39           0.65             0.26
## 14       13  4     4    5    0    0.63    0.15           0.78             0.48
## 15        6  1     2    4    1    0.22    0.22           0.45             0.19
## 16       13  1     2    3    0    0.50    0.32           0.82             0.46
##    G_plus_A_minus_PK_per90 Rk  GF GA  GD Pts Attendance .pred_Normal .pred_TOTS
## 1                     0.09  2  86 26  60  86      31639 0.0002857143 0.99971429
## 2                     0.54  4  78 42  36  76      53016 0.9690373056 0.03096269
## 3                     0.72  4  78 42  36  76      53016 0.7454276848 0.25457232
## 4                     0.18  1  85 33  52  93      41508 0.0019047619 0.99809524
## 5                     0.31  6  54 29  25  69      75290 0.2535569210 0.74644308
## 6                     0.15  2  86 26  60  86      31639 0.0187142857 0.98128571
## 7                     0.85  4  78 42  36  76      53016 0.2255183441 0.77448166
## 8                     1.05  1 106 27  79 100      54070 0.7623181004 0.23768190
## 9                     0.06  3  74 36  38  77      67953 0.0850712432 0.91492876
## 10                    0.27 14  44 64 -20  41      20231 0.9763224276 0.02367757
## 11                    0.28  4  84 38  46  75      53049 0.9166878037 0.08331220
## 12                    0.13  2  68 28  40  81      74976 0.0745622120 0.92543779
## 13                    0.65  4  67 39  28  71      54216 0.8036522291 0.19634777
## 14                    0.63  4  67 39  28  71      54216 0.3609930723 0.63900693
## 15                    0.41  9  51 48   3  52      31851 0.9491904589 0.05080954
## 16                    0.78 14  56 70 -14  45      10532 0.2149315094 0.78506849
##    .pred_class
## 1         TOTS
## 2       Normal
## 3       Normal
## 4         TOTS
## 5         TOTS
## 6         TOTS
## 7         TOTS
## 8       Normal
## 9         TOTS
## 10      Normal
## 11      Normal
## 12        TOTS
## 13      Normal
## 14        TOTS
## 15      Normal
## 16        TOTS
##          Player revision position Int TklW OG PKcon Nation          Squad Age
## 1    Harry Kane   Normal       ST  13   10  0     0    ENG      Tottenham  27
## 2 Mohamed Salah   Normal       RW   6   13  0     0    EGY      Liverpool  28
## 3   Timo Werner   Normal       ST   6   15  0     0    GER        Chelsea  25
## 4 Ollie Watkins   Normal       ST   9   20  0     0    ENG    Aston Villa  25
## 5   Jamie Vardy   Normal       ST   7    8  0     0    ENG Leicester City  34
##   Born MP Starts  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt
## 1 1993 30     30 2632                        29.2  21  13       17  4     4
## 2 1992 32     29 2633                        29.3  20   3       14  6     6
## 3 1996 31     25 2243                        24.9   6   6        6  0     0
## 4 1995 32     32 2880                        32.0  12   4       11  1     2
## 5 1987 29     26 2401                        26.7  13   8        7  6     7
##   CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1    1    0    0.72    0.44           1.16             0.58
## 2    0    0    0.68    0.10           0.79             0.48
## 3    1    0    0.24    0.24           0.48             0.24
## 4    2    0    0.37    0.12           0.50             0.34
## 5    1    0    0.49    0.30           0.79             0.26
##   G_plus_A_minus_PK_per90 Matches Rk GF GA GD Pts Attendance .pred_Normal
## 1                    1.03 Matches  7 56 38 18  53        125    0.1230343
## 2                    0.58 Matches  6 55 39 16  54        353    0.2559387
## 3                    0.48 Matches  4 51 31 20  58        125    0.4411583
## 4                    0.47 Matches 11 46 37  9  45         NA    0.4659519
## 5                    0.56 Matches  3 60 38 22  62         NA    0.5636206
##   .pred_TOTS .pred_class
## 1  0.8769657        TOTS
## 2  0.7440613        TOTS
## 3  0.5588417        TOTS
## 4  0.5340481        TOTS
## 5  0.4363794      Normal
##            Player revision position Int TklW OG PKcon Nation           Squad
## 1           Rodri   Normal      CDM  31   54  0     0    ESP Manchester City
## 2 Bruno Fernandes   Normal      CAM  18   36  0     1    POR  Manchester Utd
## 3   Son Heung min   Normal       LM  20   12  0     0    KOR       Tottenham
## 4 Marcus Rashford   Normal       LM  10    7  0     0    ENG  Manchester Utd
## 5     Mason Mount   Normal      CAM  34   44  0     0    ENG         Chelsea
##   Age Born MP Starts  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt
## 1  24 1996 29     27 2353                        26.1   2   1        1  1     1
## 2  26 1994 33     32 2821                        31.3  16  11        8  8     9
## 3  28 1992 32     31 2665                        29.6  15   9       14  1     1
## 4  23 1997 33     31 2686                        29.8  10   8       10  0     0
## 5  22 1999 32     28 2545                        28.3   6   4        5  1     1
##   CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1    4    0    0.08    0.04           0.11             0.04
## 2    5    0    0.51    0.35           0.86             0.26
## 3    0    0    0.51    0.30           0.81             0.47
## 4    4    0    0.34    0.27           0.60             0.34
## 5    2    0    0.21    0.14           0.35             0.18
##   G_plus_A_minus_PK_per90 Matches Rk GF GA GD Pts Attendance .pred_Normal
## 1                    0.08 Matches  1 69 24 45  77         NA    0.1087317
## 2                    0.61 Matches  2 64 35 29  67         NA    0.1416100
## 3                    0.78 Matches  7 56 38 18  53        125    0.2159629
## 4                    0.60 Matches  2 64 35 29  67         NA    0.2218100
## 5                    0.32 Matches  4 51 31 20  58        125    0.3730710
##   .pred_TOTS .pred_class
## 1  0.8912683        TOTS
## 2  0.8583900        TOTS
## 3  0.7840371        TOTS
## 4  0.7781900        TOTS
## 5  0.6269290        TOTS
##              Player revision position Int TklW OG PKcon Nation           Squad
## 1     Harry Maguire   Normal       CB  60   18  0     0    ENG  Manchester Utd
## 2 Aaron Wan Bissaka   Normal       RB  63   48  0     0    ENG  Manchester Utd
## 3        Ruben Dias   Normal       CB  21   19  1     1    POR Manchester City
## 4         Luke Shaw   Normal       LB  23   39  1     0    ENG  Manchester Utd
## 5      Matt Targett   Normal       LB  37   46  0     1    ENG     Aston Villa
##   Age Born MP Starts  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt
## 1  28 1993 33     33 2970                        33.0   2   1        2  0     0
## 2  23 1997 31     31 2790                        31.0   2   2        2  0     0
## 3  23 1997 29     29 2573                        28.6   1   0        1  0     0
## 4  25 1995 29     27 2384                        26.5   1   5        1  0     0
## 5  25 1995 32     32 2864                        31.8   0   1        0  0     0
##   CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1   10    0    0.06    0.03           0.09             0.06
## 2    3    0    0.06    0.06           0.13             0.06
## 3    3    0    0.03    0.00           0.03             0.03
## 4    7    0    0.04    0.19           0.23             0.04
## 5    7    0    0.00    0.03           0.03             0.00
##   G_plus_A_minus_PK_per90 Matches Rk GF GA GD Pts Attendance .pred_Normal
## 1                    0.09 Matches  2 64 35 29  67         NA    0.1639413
## 2                    0.13 Matches  2 64 35 29  67         NA    0.1932982
## 3                    0.03 Matches  1 69 24 45  77         NA    0.2619920
## 4                    0.23 Matches  2 64 35 29  67         NA    0.3116461
## 5                    0.03 Matches 11 46 37  9  45         NA    0.3983746
##   .pred_TOTS .pred_class
## 1  0.8360587        TOTS
## 2  0.8067018        TOTS
## 3  0.7380080        TOTS
## 4  0.6883539        TOTS
## 5  0.6016254        TOTS



La Liga (Spain)

##           Truth
## Prediction Normal TOTS
##     Normal    114    6
##     TOTS        4    9
##                     Player revision position Int TklW OG PKcon Nation
## 1         Sergi Roberto 17   Normal       RB  49   44  0     0    ESP
## 2  Kevin Prince Boateng 17     TOTS       ST  19   16  0     2    GHA
## 3         Dani Carvajal 17     TOTS       RB  45   41  0     0    ESP
## 4         Karim Benzema 18   Normal       ST   6    6  0     0    FRA
## 5                  Koke 18   Normal       CM  23   41  0     0    ESP
## 6               Marcelo 18   Normal       LB  26   32  0     0    BRA
## 7               Roberto 18     TOTS       RB   0    0  0     0    ESP
## 8           Ever Banega 19     TOTS      CDM  31   44  0     1    ARG
## 9                 Djene 19     TOTS       CB  59   36  0     3    TOG
## 10        Mario Hermoso 19     TOTS       CB  25   25  0     2    ESP
##                 Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast
## 1           Barcelona  24 1992 32 2385                        26.5   0   6
## 2          Las Palmas  29 1987 28 1978                        22.0  10   4
## 3         Real Madrid  24 1992 23 2018                        22.4   0   4
## 4         Real Madrid  29 1987 32 2149                        23.9   5   9
## 5     Atlético Madrid  25 1992 35 2753                        30.6   4   3
## 6         Real Madrid  29 1988 28 2262                        25.1   2   6
## 7              Málaga  31 1986 34 3060                        34.0   0   0
## 8             Sevilla  30 1988 32 2667                        29.6   3   5
## 9              Getafe  26 1991 34 2976                        33.1   0   0
## 10           Espanyol  23 1995 32 2806                        31.2   3   0
##    Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1         0  0     0    5    0    0.00    0.23           0.23             0.00
## 2        10  0     0   11    3    0.46    0.18           0.64             0.46
## 3         0  0     0   11    0    0.00    0.18           0.18             0.00
## 4         3  2     2    0    0    0.21    0.38           0.59             0.13
## 5         4  0     0    3    0    0.13    0.10           0.23             0.13
## 6         2  0     0    3    1    0.08    0.24           0.32             0.08
## 7         0  0     0    0    0    0.00    0.00           0.00             0.00
## 8         1  2     2   17    2    0.10    0.17           0.27             0.03
## 9         0  0     0   13    2    0.00    0.00           0.00             0.00
## 10        3  0     0    7    0    0.10    0.00           0.10             0.10
##    G_plus_A_minus_PK_per90 Rk  GF GA  GD Pts Attendance .pred_Normal .pred_TOTS
## 1                     0.23  2 116 37  79  90      78034    0.4945020  0.5054980
## 2                     0.64 14  53 74 -21  39      20249    0.8094432  0.1905568
## 3                     0.18  1 106 41  65  93      69426    0.7078319  0.2921681
## 4                     0.50  3  94 44  50  76      66161    0.4543076  0.5456924
## 5                     0.23  2  58 22  36  79      55483    0.3289353  0.6710647
## 6                     0.32  3  94 44  50  76      66161    0.4382130  0.5617870
## 7                     0.00 20  24 61 -37  20      20420    0.8628984  0.1371016
## 8                     0.20  6  62 47  15  59      35993    0.7037185  0.2962815
## 9                     0.00  5  48 35  13  59      11000    0.7566533  0.2433467
## 10                    0.10  7  48 50  -2  53      19388    0.9327282  0.0672718
##    .pred_class
## 1         TOTS
## 2       Normal
## 3       Normal
## 4         TOTS
## 5         TOTS
## 6         TOTS
## 7       Normal
## 8       Normal
## 9       Normal
## 10      Normal
##              Player revision position Int TklW OG PKcon Nation
## 1      Lionel Messi   Normal       RW   6   12  0     0    ARG
## 2     Karim Benzema   Normal       ST  11    5  0     0    FRA
## 3       Luis Suarez   Normal       ST   5    4  0     0    URU
## 4 Antoine Griezmann   Normal       ST  11   23  0     0    FRA
## 5   Mikel Oyarzabal   Normal       LW  16   13  0     0    ESP
##                Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast
## 1          Barcelona  33 1987 30 2573                        28.6  25   9
## 2        Real Madrid  33 1987 29 2458                        27.3  21   7
## 3    Atlético Madrid  34 1987 27 2078                        23.1  19   2
## 4          Barcelona  30 1991 30 2095                        23.3  11   6
## 5      Real Sociedad  24 1997 28 2015                        22.4  10   7
##   Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1       22  3     4    4    0    0.87    0.31           1.19             0.77
## 2       20  1     1    2    0    0.77    0.26           1.03             0.73
## 3       16  3     3    5    0    0.82    0.09           0.91             0.69
## 4       10  1     2    4    0    0.47    0.26           0.73             0.43
## 5        4  6     7    1    0    0.45    0.31           0.76             0.18
##   G_plus_A_minus_PK_per90 Matches Rk GF GA GD Pts Attendance .pred_Normal
## 1                    1.08 Matches  3 76 29 47  71         NA    0.1662030
## 2                    0.99 Matches  2 56 24 32  71         NA    0.3327133
## 3                    0.78 Matches  1 60 22 38  73         NA    0.4027990
## 4                    0.69 Matches  3 76 29 47  71         NA    0.4755379
## 5                    0.49 Matches  5 51 34 17  53         NA    0.5653534
##   .pred_TOTS .pred_class
## 1  0.8337970        TOTS
## 2  0.6672867        TOTS
## 3  0.5972010        TOTS
## 4  0.5244621        TOTS
## 5  0.4346466      Normal
##            Player revision position Int TklW OG PKcon Nation              Squad
## 1 Marcos Llorente   Normal       CM  27   41  0     0    ESP    Atlético Madrid
## 2    Angel Correa   Normal       RM  19   26  0     0    ARG    Atlético Madrid
## 3            Koke   Normal       CM  32   46  0     0    ESP    Atlético Madrid
## 4           Pedri   Normal       LM  30   28  0     0    ESP          Barcelona
## 5 Frenkie de Jong   Normal       CM  29   24  0     1    NED          Barcelona
##   Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1  26 1995 32 2529                        28.1  11  10       11  0     0    6
## 2  26 1995 33 2022                        22.5   7   8        7  0     0    3
## 3  29 1992 32 2635                        29.3   1   2        1  0     0    8
## 4  18 2002 32 2130                        23.7   2   3        2  0     0    2
## 5  23 1997 32 2721                        30.2   3   4        3  0     0    4
##   CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0    0.39    0.36           0.75             0.39                    0.75
## 2    0    0.31    0.36           0.67             0.31                    0.67
## 3    0    0.03    0.07           0.10             0.03                    0.10
## 4    0    0.08    0.13           0.21             0.08                    0.21
## 5    0    0.10    0.13           0.23             0.10                    0.23
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  1 60 22 38  73         NA    0.2397710  0.7602290        TOTS
## 2 Matches  1 60 22 38  73         NA    0.4097860  0.5902140        TOTS
## 3 Matches  1 60 22 38  73         NA    0.4153839  0.5846161        TOTS
## 4 Matches  3 76 29 47  71         NA    0.4451463  0.5548537        TOTS
## 5 Matches  3 76 29 47  71         NA    0.4470152  0.5529848        TOTS
##           Player revision position Int TklW OG PKcon Nation              Squad
## 1     Jordi Alba   Normal       LB  36   21  1     0    ESP          Barcelona
## 2   Stefan Savic   Normal       CB  29   25  0     1    MNE    Atlético Madrid
## 3 Raphael Varane   Normal       CB  35   11  1     0    FRA        Real Madrid
## 4  Mario Hermoso   Normal       CB  28   34  1     0    ESP    Atlético Madrid
## 5         Felipe   Normal       CB  37   15  1     0    BRA    Atlético Madrid
##   Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1  32 1989 29 2530                        28.1   3   5        3  0     0    8
## 2  30 1991 29 2593                        28.8   1   0        1  0     0   13
## 3  28 1993 29 2580                        28.7   2   0        2  0     0    2
## 4  25 1995 26 2181                        24.2   1   1        1  0     0    5
## 5  31 1989 26 1688                        18.8   0   0        0  0     0    6
##   CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0    0.11    0.18           0.28             0.11                    0.28
## 2    0    0.03    0.00           0.03             0.03                    0.03
## 3    0    0.07    0.00           0.07             0.07                    0.07
## 4    0    0.04    0.04           0.08             0.04                    0.08
## 5    0    0.00    0.00           0.00             0.00                    0.00
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  3 76 29 47  71         NA    0.4553549  0.5446451        TOTS
## 2 Matches  1 60 22 38  73         NA    0.4982691  0.5017309        TOTS
## 3 Matches  2 56 24 32  71         NA    0.5324940  0.4675060      Normal
## 4 Matches  1 60 22 38  73         NA    0.6077316  0.3922684      Normal
## 5 Matches  1 60 22 38  73         NA    0.6163003  0.3836997      Normal



French Ligue 1

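# Packages assumed from the functions called below (the setup chunk is not shown).
library(tidyverse)   # dplyr verbs and ggplot2
library(tidymodels)  # rsample, recipes, parsnip, workflows, tune, yardstick
library(themis)      # step_upsample()
library(stacks)      # control_stack_grid()
library(DALEX)       # model_parts()
library(DALEXtra)    # explain_tidymodels()

# Combine the three seasons and store the outcome and nationality as factors.
ligue1_modeling <- ligue1_fifa17_modeling %>% 
  bind_rows(ligue1_fifa18_modeling, ligue1_fifa19_modeling) %>% 
  mutate(revision = as.factor(revision), Nation = as.factor(Nation))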
ligue1_modeling %>% 
  ggplot(aes(x = Gls, fill = revision)) +
  geom_density(alpha = 0.5)

ligue1_modeling %>% 
  ggplot(aes(x = Rk, fill = revision)) +
  geom_density(alpha = 0.5)

ligue1_modeling %>% 
  ggplot(aes(x = minutes_played_divided_by90, fill = revision)) +
  geom_density(alpha = 0.5)

# Keep outfield players with at least 19 full matches' worth of minutes,
# fold wing-backs into the full-back positions, and drop goalkeeper-only columns.
ligue1_modeling_outfield <- ligue1_modeling %>% 
  filter(position != "GK") %>% 
  filter(minutes_played_divided_by90 >= 19) %>% 
  mutate(position = ifelse(position == "RWB", "RB", ifelse(position == "LWB", "LB", position))) %>% 
  select(-Goals_allowed, -GA90, -SoTA, -Saves, -Save_percent, -W, -L, -D, -CS, -CS_percent, -Pkatt_against, -PKA, -PKsv, -Pk_Save_percent, -PKm)
# Stratified 75/25 split so both sets keep a similar share of TOTS players.
set.seed(494)
ligue1_split <- initial_split(ligue1_modeling_outfield, prop = .75, strata = "revision")
ligue1_training <- training(ligue1_split)
ligue1_testing <- testing(ligue1_split)
# Remove the listed columns, upsample the TOTS class to about 30% of the
# Normal count, and store every numeric column as double.
ligue1_ranger_recipe <- recipe(revision ~ ., data = ligue1_training) %>% 
  step_rm(Player, Nation, Squad, Born, minutes_played_divided_by90, G_per90, A_per90, Attendance) %>% 
  step_upsample(revision, over_ratio = .3) %>% 
  step_mutate_at(all_numeric(), fn = ~as.numeric(.))

ligue1_ranger_recipe %>% 
  prep(ligue1_training) %>% 
  juice()
## # A tibble: 466 x 24
##    position   Int  TklW    OG PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1 RB          39    45     0     0    26    27  2149     0     3        0     0
##  2 LM          11    26     0     0    23    38  2476     3     5        3     0
##  3 RB          51    31     0     0    22    27  2324     0     1        0     0
##  4 ST          14    16     0     0    26    36  2225    10     4       10     0
##  5 CB          41    36     0     0    32    32  2635     1     0        1     0
##  6 RB          72    74     0     3    29    27  2395     0     1        0     0
##  7 RB          67    31     0     1    26    26  2198     1     1        1     0
##  8 CB          27    26     0     1    25    34  3015     1     0        1     0
##  9 RB          74    63     0     0    23    30  2646     0     4        0     0
## 10 LB          24    23     0     3    22    27  2121     0     4        0     0
## # … with 456 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## #   CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
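The recipe's `step_upsample(revision, over_ratio = .3)` resamples the minority class with replacement until it has roughly 30% as many rows as the majority class, which is why the prepped data above has more rows than the raw training set. The base R sketch below illustrates the idea on toy data. The data frame `toy`, the helper `upsample_minority`, and the use of `floor()` for rounding are assumptions made for illustration; this is not the themis implementation.

```r
set.seed(494)

# Hypothetical toy data: 90 "Normal" rows and 10 "TOTS" rows.
toy <- data.frame(
  revision = factor(c(rep("Normal", 90), rep("TOTS", 10))),
  Gls = c(rpois(90, 3), rpois(10, 12))
)

# Hypothetical helper: resample under-represented levels with replacement until
# each has floor(majority count * over_ratio) rows. Larger levels are left unchanged.
upsample_minority <- function(df, outcome, over_ratio) {
  counts <- table(df[[outcome]])
  target <- floor(max(counts) * over_ratio)
  pieces <- lapply(names(counts), function(level) {
    rows <- which(df[[outcome]] == level)
    extra <- max(target - length(rows), 0)
    # Keep every original row and append `extra` randomly repeated ones.
    c(rows, rows[sample.int(length(rows), extra, replace = TRUE)])
  })
  df[unlist(pieces), , drop = FALSE]
}

toy_up <- upsample_minority(toy, "revision", over_ratio = 0.3)
print(table(toy_up$revision))
```

Because the added rows are copies of existing TOTS players, upsampling changes the class balance the forest sees during training but adds no new information about those players.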
ligue1_ranger <- rand_forest(mtry = tune(), 
              min_n = tune(), 
              trees = 100) %>% 
  set_mode("classification") %>% 
  set_engine("ranger")

ligue1_ranger_wf <- 
  workflow() %>% 
  add_recipe(ligue1_ranger_recipe) %>% 
  add_model(ligue1_ranger) 

ligue1_ranger_wf
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger
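set.seed(494)
# 5-fold cross-validation on the training set.
ligue1_cv <- vfold_cv(ligue1_training, v = 5)

# 3 x 3 regular grid over min_n and mtry. The mtry range is finalized on the raw
# training columns (31 predictors), while the recipe keeps only 23.
ligue1_rf_grid <- grid_regular(min_n(), finalize(mtry(), ligue1_training %>% select(-revision)), levels = 3)

ligue1_ctrl_res <- control_stack_grid()

ligue1_ranger_cv <- ligue1_ranger_wf %>% 
  tune_grid(resamples = ligue1_cv,
           grid = ligue1_rf_grid,
           control = ligue1_ctrl_res)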
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
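The warnings above say that 31 columns were requested but only 23 were available. The arithmetic behind those two numbers can be checked in base R: the training data has 31 columns besides `revision`, and `step_rm()` drops eight of them. The helper `regular_mtry_grid` below is hypothetical; it only mimics an evenly spaced three-level integer grid and is not the dials implementation.

```r
# Columns removed by step_rm() in the recipe.
removed_by_recipe <- c("Player", "Nation", "Squad", "Born",
                       "minutes_played_divided_by90", "G_per90", "A_per90", "Attendance")

raw_predictors <- 31  # the number of columns "requested" in the warning
predictors_after_recipe <- raw_predictors - length(removed_by_recipe)

# Hypothetical helper: evenly spaced integer candidates between 1 and `upper`.
regular_mtry_grid <- function(upper, levels = 3) {
  unique(as.integer(round(seq(1, upper, length.out = levels))))
}

grid_as_run  <- regular_mtry_grid(raw_predictors)           # matches the mtry values tuned above
grid_matched <- regular_mtry_grid(predictors_after_recipe)  # range limited to available predictors

print(predictors_after_recipe)
print(grid_as_run)
print(grid_matched)
```

One way to avoid the mismatch, under these assumptions, would be to finalize `mtry()` on the prepped recipe output (without `revision`) instead of on `ligue1_training`. The warning text is truncated in the output, but in this situation the model is most likely fit with all 23 available predictors, so the rows labelled `mtry = 31` probably behave like `mtry = 23`.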
collect_metrics(ligue1_ranger_cv)
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.912     5  0.0198 Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.938     5  0.0238 Preprocessor1_Model1
##  3     1    21 accuracy binary     0.912     5  0.0213 Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.942     5  0.0187 Preprocessor1_Model2
##  5     1    40 accuracy binary     0.912     5  0.0213 Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.944     5  0.0176 Preprocessor1_Model3
##  7    16     2 accuracy binary     0.926     5  0.0219 Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.943     5  0.0233 Preprocessor1_Model4
##  9    16    21 accuracy binary     0.912     5  0.0179 Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.935     5  0.0259 Preprocessor1_Model5
## 11    16    40 accuracy binary     0.912     5  0.0183 Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.924     5  0.0299 Preprocessor1_Model6
## 13    31     2 accuracy binary     0.924     5  0.0178 Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.935     5  0.0259 Preprocessor1_Model7
## 15    31    21 accuracy binary     0.909     5  0.0192 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.929     5  0.0277 Preprocessor1_Model8
## 17    31    40 accuracy binary     0.907     5  0.0208 Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.929     5  0.0263 Preprocessor1_Model9
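# Pick the hyperparameters with the highest cross-validated accuracy.
ligue1_best1 <- ligue1_ranger_cv %>% 
  select_best(metric = "accuracy")

# Plug them into the workflow and fit on the full training set.
ligue1_ranger_final_wf <- ligue1_ranger_wf %>% 
  finalize_workflow(ligue1_best1)
ligue1_ranger_fit <- ligue1_ranger_final_wf %>% 
  fit(ligue1_training)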


ligue1_rf_explain <- 
  explain_tidymodels(
    model = ligue1_ranger_fit,
    data = ligue1_training %>% select(-revision), 
    y = as.numeric(ligue1_training$revision == "TOTS"),
    label = "rf"
  )
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  407  rows  31  cols 
##   -> target variable   :  407  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0 , mean =  0.147113 , max =  1  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.86 , mean =  -0.0291769 , max =  0.53  
##   A new explainer has been created! 
ligue1_rf_var_imp <- 
  model_parts(
    ligue1_rf_explain
    )

plot(ligue1_rf_var_imp)

ligue1_ranger_test <- ligue1_ranger_final_wf %>% 
  last_fit(ligue1_split)

ligue1_ranger_test %>% 
  collect_metrics()
## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.873 Preprocessor1_Model1
## 2 roc_auc  binary         0.924 Preprocessor1_Model1
ligue1_preds1 <- ligue1_ranger_test %>% 
  collect_predictions()

ligue1_preds1 %>% 
  conf_mat(revision, .pred_class)
##           Truth
## Prediction Normal TOTS
##     Normal    108    6
##     TOTS       11    9
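# Predict with ligue1_ranger_fit instead of last_fit(). Both forests are fit on the
# same training data but with different random draws, so the counts can differ slightly.
# This also overwrites the last_fit() result stored in ligue1_ranger_test.
ligue1_ranger_test <- ligue1_testing %>% 
  bind_cols(predict(ligue1_ranger_fit, new_data = ligue1_testing, type = "prob")) %>% 
  bind_cols(predict(ligue1_ranger_fit, new_data = ligue1_testing)) 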
ligue1_ranger_test %>% 
  conf_mat(revision, .pred_class)
##           Truth
## Prediction Normal TOTS
##     Normal    108    5
##     TOTS       11   10
ligue1_ranger_test %>% 
  filter(revision != .pred_class)
##              Player revision position Int TklW OG PKcon Nation      Squad Age
## 1        Lois Diony     TOTS       ST   4   14  0     0    FRA      Dijon  23
## 2    Blaise Matuidi   Normal      CDM  40   42  0     0    FRA  Paris S-G  29
## 3    Djibril Sidibe   Normal       RB  47   52  0     1    FRA     Monaco  24
## 4          Jemerson   Normal       CB  54   51  0     0    BRA     Monaco  23
## 5  Giovani Lo Celso   Normal      CAM  20   59  0     0    ARG  Paris S-G  21
## 6     Dimitri Payet   Normal       LW  18    9  0     0    FRA  Marseille  30
## 7     Alassane Plea   Normal       ST  13    9  0     0    FRA       Nice  24
## 8         Adil Rami     TOTS       CB  33   20  1     0    FRA  Marseille  31
## 9        Dani Alves   Normal       RB  28   52  0     0    BRA  Paris S-G  34
## 10            Jorge   Normal       LB  52   33  0     0    BRA     Monaco  21
## 11   Radamel Falcao     TOTS       ST  13    7  1     0    COL     Monaco  31
## 12    Joao Moutinho   Normal       CM  39   44  0     0    POR     Monaco  30
## 13    Houssem Aouar   Normal       CM  31   36  0     0    FRA       Lyon  20
## 14       Kenny Lala     TOTS       RB  29   43  0     1    FRA Strasbourg  26
## 15    Ferland Mendy     TOTS       LB  25   30  0     1    FRA       Lyon  23
## 16       Zeki Celik   Normal       RB  34   55  0     3    TUR      Lille  21
##    Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1  1992 35 2807                        31.2  11   7       11  0     0    2    1
## 2  1987 34 2415                        26.8   4   4        4  0     0    4    0
## 3  1992 29 2321                        25.8   2   5        2  0     0    7    0
## 4  1992 34 3058                        34.0   2   0        2  0     0    8    2
## 5  1996 33 1776                        19.7   4   2        4  0     0    2    0
## 6  1987 31 2347                        26.1   6  13        4  2     3    6    0
## 7  1993 35 3041                        33.8  16   4       15  1     2    7    0
## 8  1985 33 2955                        32.8   1   1        1  0     0    5    0
## 9  1983 25 2065                        22.9   1   4        1  0     0    7    1
## 10 1996 22 1919                        21.3   1   2        1  0     0    8    0
## 11 1986 26 2128                        23.6  18   2       15  3     4    1    0
## 12 1986 33 2802                        31.1   1   4        1  0     0    6    0
## 13 1998 37 3061                        34.0   7   7        7  0     0    2    0
## 14 1991 34 3060                        34.0   5   9        4  1     2    4    0
## 15 1995 30 2531                        28.1   2   1        2  0     0    2    0
## 16 1997 34 2971                        33.0   1   5        1  0     0    5    1
##    G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90 Rk
## 1     0.35    0.22           0.58             0.35                    0.58 16
## 2     0.15    0.15           0.30             0.15                    0.30  2
## 3     0.08    0.19           0.27             0.08                    0.27  1
## 4     0.06    0.00           0.06             0.06                    0.06  1
## 5     0.20    0.10           0.30             0.20                    0.30  1
## 6     0.23    0.50           0.73             0.15                    0.65  4
## 7     0.47    0.12           0.59             0.44                    0.56  8
## 8     0.03    0.03           0.06             0.03                    0.06  4
## 9     0.04    0.17           0.22             0.04                    0.22  1
## 10    0.05    0.09           0.14             0.05                    0.14  2
## 11    0.76    0.08           0.85             0.63                    0.72  2
## 12    0.03    0.13           0.16             0.03                    0.16  2
## 13    0.21    0.21           0.41             0.21                    0.41  3
## 14    0.15    0.26           0.41             0.12                    0.38 11
## 15    0.07    0.04           0.11             0.07                    0.11  3
## 16    0.03    0.15           0.18             0.03                    0.18  2
##     GF GA  GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1   46 58 -12  37      10126        0.720      0.280      Normal
## 2   83 27  56  87      45160        0.360      0.640        TOTS
## 3  107 31  76  95       9586        0.180      0.820        TOTS
## 4  107 31  76  95       9586        0.130      0.870        TOTS
## 5  108 29  79  93      46929        0.375      0.625        TOTS
## 6   80 47  33  77      46040        0.460      0.540        TOTS
## 7   53 52   1  54      22876        0.270      0.730        TOTS
## 8   80 47  33  77      46040        0.740      0.260      Normal
## 9  108 29  79  93      46929        0.170      0.830        TOTS
## 10  85 45  40  80       9243        0.475      0.525        TOTS
## 11  85 45  40  80       9243        0.840      0.160      Normal
## 12  85 45  40  80       9243        0.200      0.800        TOTS
## 13  70 47  23  72      49079        0.290      0.710        TOTS
## 14  58 48  10  49      25216        0.790      0.210      Normal
## 15  70 47  23  72      49079        0.720      0.280      Normal
## 16  68 33  35  75      34079        0.365      0.635        TOTS
ligue1_modeling21 <- fifa21_modeling_ligue1 %>% 
  mutate(revision = as.factor(revision), Nation = as.factor(Nation), Age = as.integer(Age))
ligue1_modeling_outfield21 <- ligue1_modeling21 %>% 
  filter(position != "GK") %>% 
  #filter(position %in% c("ST", "LW", "RW", "CF", "CAM")) %>% 
  filter(minutes_played_divided_by90 >= 14) %>% 
  mutate(position = case_when(
    position == "RWB" ~ "RB",
    position == "LWB" ~ "LB",
    TRUE ~ position
  ))
ligue1_ranger_test21 <- ligue1_modeling_outfield21 %>% 
  bind_cols(predict(ligue1_ranger_fit, new_data = ligue1_modeling_outfield21, type = "prob")) %>% 
  bind_cols(predict(ligue1_ranger_fit, new_data = ligue1_modeling_outfield21)) 
## Warning: Novel levels found in column 'Nation': 'AUT', 'CAN', 'CHI', 'CRC',
## 'ECU', 'PER', 'SCO', 'ZIM'. The levels have been removed, and values have been
## coerced to 'NA'.

## Warning: Novel levels found in column 'Nation': 'AUT', 'CAN', 'CHI', 'CRC',
## 'ECU', 'PER', 'SCO', 'ZIM'. The levels have been removed, and values have been
## coerced to 'NA'.
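
The warning appears once per predict() call, which is why it is printed twice. It is raised because the 2021 data contains nations that never occurred in the training data. The base-R sketch below shows the mechanism with made-up nation codes (the vectors are hypothetical and chosen only for illustration): re-encoding new values against the training levels turns unseen codes into NA.

```r
# Hypothetical nation codes, chosen only to illustrate the mechanism
training_nation <- factor(c("FRA", "BRA", "ARG", "FRA"))
new_nation <- c("FRA", "SCO", "BRA", "ZIM")

# Re-encoding new data with the training levels turns unseen codes into NA
aligned <- factor(new_nation, levels = levels(training_nation))
print(aligned)

# The codes that would be reported as novel levels
novel <- setdiff(unique(new_nation), levels(training_nation))
print(novel)
```

If the recipe drops Nation before modeling, as the Bundesliga recipe in this report does with step_rm(), these NA values should not reach the model, so the warning is likely harmless here.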
ligue1_ranger_test21 %>% 
  filter(position %in% c("ST", "RW", "CF", "LW")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##              Player revision position Int TklW OG PKcon Nation       Squad Age
## 1     Memphis Depay   Normal       CF   9    5  0     0    NED        Lyon  27
## 2    Gaetan Laborde   Normal       ST  14   30  0     0    FRA Montpellier  26
## 3     Kylian Mbappe   Normal       ST   5    4  0     0    FRA   Paris S-G  22
## 4     Kevin Volland   Normal       ST   3   26  0     0    GER      Monaco  28
## 5 Wissam Ben Yedder   Normal       ST   8   13  0     0    FRA      Monaco  30
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1994 34 2653                        29.5  18   9       10  8     8    4    0
## 2 1994 34 2932                        32.6  13   8       13  0     0    3    0
## 3 1998 29 2214                        24.6  25   7       19  6     6    5    0
## 4 1992 31 2419                        26.9  15   7       15  0     0    3    0
## 5 1990 33 2266                        25.2  18   5        9  9    11    3    0
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.61    0.31           0.92             0.34                    0.64
## 2    0.40    0.25           0.64             0.40                    0.64
## 3    1.02    0.28           1.30             0.77                    1.06
## 4    0.56    0.26           0.82             0.56                    0.82
## 5    0.71    0.20           0.91             0.36                    0.56
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  4 67 35 32  67        258         0.17       0.83        TOTS
## 2 Matches  8 54 57 -3  47         NA         0.20       0.80        TOTS
## 3 Matches  2 77 26 51  72         NA         0.38       0.62        TOTS
## 4 Matches  3 71 38 33  71         NA         0.38       0.62        TOTS
## 5 Matches  3 71 38 33  71         NA         0.41       0.59        TOTS
ligue1_ranger_test21 %>% 
  filter(position %in% c("CAM", "CM", "CDM", "LM", "RM")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##                Player revision position Int TklW OG PKcon Nation     Squad Age
## 1      Jonathan Bamba   Normal       LM  27   31  0     0    FRA     Lille  25
## 2 Aurelien Tchouameni   Normal       CM  55   86  0     0    FRA    Monaco  21
## 3       Ander Herrera   Normal       CM  22   32  0     0    ESP Paris S-G  31
## 4         Gael Kakuta   Normal      CAM  21   26  0     1    COD      Lens  29
## 5     Leandro Paredes   Normal       CM  16   25  0     0    ARG Paris S-G  26
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1996 34 2719                        30.2   6   9        6  0     0    2    0
## 2 2000 32 2703                        30.0   2   4        2  0     0    9    1
## 3 1989 27 1571                        17.5   1   3        1  0     0    3    0
## 4 1991 31 2217                        24.6  11   5        4  7     9    2    0
## 5 1994 20 1288                        14.3   1   2        1  0     0    7    1
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.20    0.30           0.50             0.20                    0.50
## 2    0.07    0.13           0.20             0.07                    0.20
## 3    0.06    0.17           0.23             0.06                    0.23
## 4    0.45    0.20           0.65             0.16                    0.37
## 5    0.07    0.14           0.21             0.07                    0.21
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  1 57 22 35  73        234        0.390      0.610        TOTS
## 2 Matches  3 71 38 33  71         NA        0.470      0.530        TOTS
## 3 Matches  2 77 26 51  72         NA        0.645      0.355      Normal
## 4 Matches  5 54 46  8  56         NA        0.660      0.340      Normal
## 5 Matches  2 77 26 51  72         NA        0.670      0.330      Normal
ligue1_ranger_test21 %>% 
  filter(position %in% c("LB", "CB", "RB")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##             Player revision position Int TklW OG PKcon Nation     Squad Age
## 1   Thomas Delaine   Normal       LB  17   15  0     0    FRA      Metz  29
## 2 Leonardo Balerdi   Normal       CB  29   17  0     0    ARG Marseille  22
## 3       Leo Dubois   Normal       RB  22   29  0     1    FRA      Lyon  26
## 4  Senou Coulibaly   Normal       CB  34   20  0     0    MLI     Dijon  26
## 5      Sven Botman   Normal       CB  36   27  0     0    NED     Lille  21
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1992 22 1600                        17.8   3   1        3  0     0    4    0
## 2 1999 17 1363                        15.1   2   0        2  0     0    5    1
## 3 1994 33 2610                        29.0   2   3        2  0     0    3    0
## 4 1994 19 1664                        18.5   2   0        2  0     0    6    1
## 5 2000 33 2949                        32.8   0   0        0  0     0    2    0
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.17    0.06           0.23             0.17                    0.23
## 2    0.13    0.00           0.13             0.13                    0.13
## 3    0.07    0.10           0.17             0.07                    0.17
## 4    0.11    0.00           0.11             0.11                    0.11
## 5    0.00    0.00           0.00             0.00                    0.00
##   Matches Rk GF GA  GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches 10 37 41  -4  43        557        0.560      0.440      Normal
## 2 Matches  6 49 42   7  55         NA        0.585      0.415      Normal
## 3 Matches  4 67 35  32  67        258        0.630      0.370      Normal
## 4 Matches 20 23 61 -38  18         NA        0.645      0.355      Normal
## 5 Matches  1 57 22  35  73        234        0.665      0.335      Normal



German Bundesliga

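# Append the game edition to each name so a player who appears in several seasons
# gets a distinct identifier per season
fifa19_modeling_bundesliga2 <- fifa19_modeling_bundesliga %>%
  mutate(Player = paste(Player, '19'))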

fifa18_modeling_bundesliga2 <- fifa18_modeling_bundesliga %>%
  mutate(Player = paste(Player, '18'))

fifa17_modeling_bundesliga2 <- fifa17_modeling_bundesliga %>%
  mutate(Player = paste(Player, '17'))



bundesliga_modeling <- fifa17_modeling_bundesliga2 %>% 
  bind_rows(fifa18_modeling_bundesliga2, fifa19_modeling_bundesliga2) %>% 
  mutate(revision = as.factor(revision), Nation = as.factor(Nation))
bundesliga_modeling %>% 
  select(where(is.numeric)) %>% 
  pivot_longer(cols = everything(),
               names_to = "variable", 
               values_to = "value") %>% 
  ggplot(aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(vars(variable), 
             scales = "free")
## Warning: Removed 17990 rows containing non-finite values (stat_bin).

bundesliga_modeling %>% 
  ggplot(aes(x = revision, fill = revision)) +
  geom_bar() +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold"))

bundesliga_modeling %>% 
  ggplot(aes(x = Gls, fill = revision)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold")) +
  xlab("Goals")

bundesliga_modeling %>% 
  ggplot(aes(x = Rk, fill = revision)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold")) +
  xlab("Table Position")

bundesliga_modeling %>% 
  ggplot(aes(x = minutes_played_divided_by90, fill = revision)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold")) +
  xlab("Total Minutes Played Divided by 90 (Full Games Played)")

bundesliga_modeling %>% 
  ggplot(aes(x = position, fill = revision)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold"))

bundesliga_modeling_outfield <- bundesliga_modeling %>% 
  filter(position != "GK") %>% 
  #filter(position %in% c("ST", "LW", "RW", "CF", "CAM")) %>% 
  filter(minutes_played_divided_by90 >= 19) %>% 
  mutate(position = case_when(
    position == "RWB" ~ "RB",
    position == "LWB" ~ "LB",
    position == "RW" ~ "RM",
    position == "LW" ~ "LM",
    TRUE ~ position
  )) %>% 
  select(-Goals_allowed, -GA90, -SoTA, -Saves, -Save_percent, -W, -L, -D, -CS, -CS_percent, -Pkatt_against, -PKA, -PKsv, -Pk_Save_percent, -PKm)
set.seed(494)
bundesliga_split <- initial_split(bundesliga_modeling_outfield, prop = .75, strata = "revision")
bundesliga_training <- training(bundesliga_split)
bundesliga_testing <- testing(bundesliga_split)
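bundesliga_ranger_recipe <- recipe(revision ~ ., data = bundesliga_training) %>% 
  # Drop identifiers and columns that duplicate information kept elsewhere
  step_rm(Player, Nation, Squad, G_per90, A_per90, minutes_played_divided_by90, Attendance, Born) %>% 
  # Upsample TOTS rows until they number 55% of the Normal rows
  step_upsample(revision, over_ratio = .55) %>% 
  # Store every numeric column as double
  step_mutate_at(all_numeric(), fn = ~as.numeric(.))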

bundesliga_ranger_recipe %>% 
  prep(bundesliga_training) %>% 
  juice()
## # A tibble: 434 x 24
##    position   Int  TklW    OG PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1 CM          37    51     0     1    28    27  2302     1     3        1     0
##  2 LM          29    10     0     0    30    26  1982     7     4        6     1
##  3 ST          11    20     0     0    23    28  1735     5     1        5     0
##  4 CB          34    41     0     0    28    27  2315     3     2        3     0
##  5 CB          74    39     0     0    19    25  2106     0     0        0     0
##  6 CB          10    30     0     0    24    26  2126     3     0        3     0
##  7 LM          27    17     0     0    26    25  1729     1     3        1     0
##  8 CB          16    21     0     1    21    28  2427     0     2        0     0
##  9 CB          35    23     0     1    32    30  2637     0     0        0     0
## 10 CB          21    12     0     1    22    24  1901     1     0        1     0
## # … with 424 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## #   CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
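
The prepped data has 434 rows, while the explainer output further down reports 328 rows in bundesliga_training; the difference comes from step_upsample(). The base-R sketch below works backwards from those two totals to the class counts. The counts it finds are inferred, not printed anywhere in this report, and the use of floor() is an assumption about how the upsampling target is rounded.

```r
n_training  <- 328   # rows in bundesliga_training (from the explainer output)
n_upsampled <- 434   # rows after prep() and juice()
over_ratio  <- 0.55

# Find the majority-class count consistent with both totals:
# majority + floor(over_ratio * majority) must equal the upsampled row count.
candidates <- 1:n_training
n_normal <- candidates[candidates + floor(over_ratio * candidates) == n_upsampled]

n_tots           <- n_training - n_normal          # TOTS rows before upsampling
n_tots_upsampled <- floor(over_ratio * n_normal)   # TOTS rows after upsampling

print(c(normal = n_normal, tots = n_tots, tots_upsampled = n_tots_upsampled))
```

Under these assumptions the model is trained on roughly three times as many TOTS rows as the raw training data contains, all of them duplicates of the original TOTS players.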
bundesliga_ranger <- rand_forest(mtry = tune(), 
              min_n = tune(), 
              trees = 100) %>% 
  set_mode("classification") %>% 
  set_engine("ranger")

bundesliga_ranger_wf <- 
  workflow() %>% 
  add_recipe(bundesliga_ranger_recipe) %>% 
  add_model(bundesliga_ranger) 

bundesliga_ranger_wf
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger
set.seed(494)
bundesliga_cv <- vfold_cv(bundesliga_training, v = 5)

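# mtry is finalized on all 31 raw columns, but the recipe leaves only 23 predictors,
# so the mtry = 31 candidates trigger the "31 columns were requested" warnings below
bundesliga_rf_grid <- grid_regular(
  min_n(),
  finalize(mtry(), bundesliga_training %>% select(-revision)),
  levels = 3
)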

ctrl_res <- control_stack_grid()

bundesliga_ranger_cv <- bundesliga_ranger_wf %>% 
  tune_grid(resamples = bundesliga_cv,
           grid = bundesliga_rf_grid,
           control = ctrl_res)
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
collect_metrics(bundesliga_ranger_cv)
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.884     5  0.0130 Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.907     5  0.0111 Preprocessor1_Model1
##  3     1    21 accuracy binary     0.893     5  0.0133 Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.890     5  0.0160 Preprocessor1_Model2
##  5     1    40 accuracy binary     0.884     5  0.0146 Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.900     5  0.0105 Preprocessor1_Model3
##  7    16     2 accuracy binary     0.869     5  0.0121 Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.867     5  0.0202 Preprocessor1_Model4
##  9    16    21 accuracy binary     0.884     5  0.0146 Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.862     5  0.0172 Preprocessor1_Model5
## 11    16    40 accuracy binary     0.875     5  0.0144 Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.862     5  0.0169 Preprocessor1_Model6
## 13    31     2 accuracy binary     0.863     5  0.0196 Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.864     5  0.0240 Preprocessor1_Model7
## 15    31    21 accuracy binary     0.863     5  0.0212 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.851     5  0.0140 Preprocessor1_Model8
## 17    31    40 accuracy binary     0.869     5  0.0181 Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.852     5  0.0175 Preprocessor1_Model9
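
The table shows mtry values of 1, 16, and 31 and min_n values of 2, 21, and 40, which is an even three-level spread over each parameter's range. The base-R sketch below mimics that spacing; `regular_levels` is a hypothetical helper that approximates what grid_regular() does for an integer parameter and is not a call into dials. It also shows the mtry values that an upper bound of 23, the number of predictors left after the recipe, would produce.

```r
# Evenly spaced integer levels between a lower and an upper bound
regular_levels <- function(lower, upper, levels = 3) {
  unique(as.integer(round(seq(lower, upper, length.out = levels))))
}

mtry_raw     <- regular_levels(1, 31)  # upper bound taken from the 31 raw columns
min_n_levels <- regular_levels(2, 40)  # matches the min_n values in the table
mtry_prepped <- regular_levels(1, 23)  # upper bound taken from the 23 prepped predictors

print(mtry_raw)
print(min_n_levels)
print(mtry_prepped)
```

With the smaller upper bound, every mtry candidate would be a distinct, valid setting and the tuning warnings should disappear.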
bundesliga_best1 <- bundesliga_ranger_cv %>% 
  select_best(metric = "accuracy")

bundesliga_ranger_final_wf <- bundesliga_ranger_wf %>% 
  finalize_workflow(bundesliga_best1)
set.seed(494)
bundesliga_ranger_fit <- bundesliga_ranger_final_wf %>% 
  fit(bundesliga_training)


bundesliga_rf_explain <- 
  explain_tidymodels(
    model = bundesliga_ranger_fit,
    data = bundesliga_training %>% select(-revision), 
    y = as.numeric(bundesliga_training$revision == "TOTS"),
    label = "rf"
  )
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  328  rows  31  cols 
##   -> target variable   :  328  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0.01184855 , mean =  0.235783 , max =  0.9191418  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.7846212 , mean =  -0.08944149 , max =  0.714248  
##   A new explainer has been created! 
bundesliga_rf_var_imp <- 
  model_parts(
    bundesliga_rf_explain
    )

plot(bundesliga_rf_var_imp)

bundesliga_ranger_test <- bundesliga_ranger_final_wf %>% 
  last_fit(bundesliga_split)

bundesliga_ranger_test %>% 
  collect_metrics()
## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.835 Preprocessor1_Model1
## 2 roc_auc  binary         0.862 Preprocessor1_Model1
preds1 <- bundesliga_ranger_test %>% 
  collect_predictions()

preds1 %>% 
  conf_mat(revision, .pred_class)
##           Truth
## Prediction Normal TOTS
##     Normal     84    9
##     TOTS        9    7
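# Re-score the test set with the fitted workflow to keep class probabilities next to the
# predicted class. This overwrites the last_fit() result stored under the same name, and
# because last_fit() refits the forest, the counts can differ slightly from the matrix above.
bundesliga_ranger_test <- bundesliga_testing %>% 
  bind_cols(predict(bundesliga_ranger_fit, new_data = bundesliga_testing, type = "prob")) %>% 
  bind_cols(predict(bundesliga_ranger_fit, new_data = bundesliga_testing)) 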
bundesliga_ranger_test %>% 
  conf_mat(revision, .pred_class)
##           Truth
## Prediction Normal TOTS
##     Normal     85    9
##     TOTS        8    7
bundesliga_ranger_test %>% 
  filter(revision != .pred_class)
##                    Player revision position Int TklW OG PKcon Nation
## 1       Kerem Demirbay 17   Normal      CAM  44   33  0     0    GER
## 2         Marco Fabian 17     TOTS      CAM  51   31  0     0    MEX
## 3       Vincenzo Grifo 17     TOTS       LM  37   31  0     0    ITA
## 4       Sebastian Rudy 17     TOTS       CM  99   64  0     0    GER
## 5        Javi Martinez 17   Normal       CB  55   37  0     0    ESP
## 6        Julian Brandt 18   Normal       LM  15   16  0     0    GER
## 7  Michael Gregoritsch 18     TOTS      CAM  12   15  0     0    AUT
## 8       Thorgan Hazard 18     TOTS       LM  20   33  0     0    BEL
## 9           Naby Keita 18     TOTS       CM  22   33  0     0    GUI
## 10     Andrej Kramaric 18   Normal       ST   8    3  0     0    CRO
## 11         Philipp Max 18     TOTS       LB  19   31  0     0    GER
## 12       Nils Petersen 18     TOTS       ST  15   16  0     0    GER
## 13             Wendell 18     TOTS       LB  20   25  0     0    BRA
## 14      Ishak Belfodil 19   Normal       ST   2    9  0     0    ALG
## 15        Mats Hummels 19   Normal       CB  27   13  0     1    GER
## 16     Andrej Kramaric 19   Normal       ST   8   11  0     0    CRO
## 17     Lukasz Piszczek 19   Normal       RB  31   30  0     0    POL
##             Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G
## 1      Hoffenheim  23 1993 28 2169                        24.1   6   8        6
## 2  Eint Frankfurt  27 1989 24 2054                        22.8   7   4        6
## 3        Freiburg  23 1993 30 2492                        27.7   6   7        5
## 4      Hoffenheim  26 1990 32 2786                        31.0   2   6        2
## 5   Bayern Munich  27 1988 25 2131                        23.7   1   1        1
## 6      Leverkusen  21 1996 34 2326                        25.8   9   3        9
## 7        Augsburg  23 1994 32 2527                        28.1  13   3       12
## 8      M'Gladbach  24 1993 34 2939                        32.7  10   5        5
## 9      RB Leipzig  22 1995 27 1962                        21.8   6   5        6
## 10     Hoffenheim  26 1991 34 2228                        24.8  13   6       11
## 11       Augsburg  23 1993 33 2959                        32.9   2  12        2
## 12       Freiburg  28 1988 32 2244                        24.9  15   1       10
## 13     Leverkusen  24 1993 26 2115                        23.5   2   3        0
## 14     Hoffenheim  26 1992 28 1863                        20.7  16   3       16
## 15  Bayern Munich  29 1988 21 1775                        19.7   1   1        1
## 16     Hoffenheim  27 1991 30 2396                        26.6  17   4       12
## 17       Dortmund  33 1985 20 1756                        19.5   1   6        1
##    PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1   0     0    4    0    0.25    0.33           0.58             0.25
## 2   1     2   10    0    0.31    0.18           0.48             0.26
## 3   1     1    1    0    0.22    0.25           0.47             0.18
## 4   0     0    9    0    0.06    0.19           0.26             0.06
## 5   0     0    5    0    0.04    0.04           0.08             0.04
## 6   0     0    0    0    0.35    0.12           0.46             0.35
## 7   1     1    3    0    0.46    0.11           0.57             0.43
## 8   5     6    1    0    0.31    0.15           0.46             0.15
## 9   0     0    8    2    0.28    0.23           0.50             0.28
## 10  2     2    1    0    0.53    0.24           0.77             0.44
## 11  0     0    5    0    0.06    0.36           0.43             0.06
## 12  5     6    4    1    0.60    0.04           0.64             0.40
## 13  2     3    7    1    0.09    0.13           0.21             0.00
## 14  0     0    3    0    0.77    0.14           0.92             0.77
## 15  0     0    1    0    0.05    0.05           0.10             0.05
## 16  5     6    2    0    0.64    0.15           0.79             0.45
## 17  0     0    3    0    0.05    0.31           0.36             0.05
##    G_plus_A_minus_PK_per90 Rk GF GA  GD Pts Attendance .pred_Normal .pred_TOTS
## 1                     0.58  4 64 37  27  62      28155    0.4478120  0.5521880
## 2                     0.44 11 36 43  -7  42      49165    0.8021510  0.1978490
## 3                     0.43  7 42 60 -18  48      23959    0.8394151  0.1605849
## 4                     0.26  4 64 37  27  62      28155    0.5792912  0.4207088
## 5                     0.08  1 89 22  67  82      75000    0.2754054  0.7245946
## 6                     0.46  5 58 44  14  55      28415    0.4763345  0.5236655
## 7                     0.53 12 43 46  -3  41      28238    0.7306608  0.2693392
## 8                     0.31  9 47 52  -5  47      50986    0.6338276  0.3661724
## 9                     0.50  6 57 53   4  53      39397    0.7875857  0.2124143
## 10                    0.69  3 66 48  18  55      28716    0.2964712  0.7035288
## 11                    0.43 12 43 46  -3  41      28238    0.5823812  0.4176188
## 12                    0.44 15 32 56 -24  36      23894    0.7362705  0.2637295
## 13                    0.13  5 58 44  14  55      28415    0.7559639  0.2440361
## 14                    0.92  9 70 52  18  51      28456    0.3138931  0.6861069
## 15                    0.10  1 88 32  56  78      75000    0.3905305  0.6094695
## 16                    0.60  9 70 52  18  51      28456    0.3151089  0.6848911
## 17                    0.36  2 81 44  37  76      80841    0.4234112  0.5765888
##    .pred_class
## 1         TOTS
## 2       Normal
## 3       Normal
## 4       Normal
## 5         TOTS
## 6         TOTS
## 7       Normal
## 8       Normal
## 9       Normal
## 10        TOTS
## 11      Normal
## 12      Normal
## 13      Normal
## 14        TOTS
## 15        TOTS
## 16        TOTS
## 17        TOTS
bundesliga_modeling21 <- fifa21_modeling_bundesliga %>% 
  mutate(revision = as.factor(revision), Nation = as.factor(Nation), Age = as.integer(Age))
bundesliga_modeling_outfield21 <- bundesliga_modeling21 %>% 
  filter(position != "GK") %>% 
  #filter(position %in% c("ST", "LW", "RW", "CF", "CAM")) %>% 
  filter(minutes_played_divided_by90 >= 18) %>% 
  mutate(position = case_when(
    position == "RWB" ~ "RB",
    position == "LWB" ~ "LB",
    position == "RW" ~ "RM",
    position == "LW" ~ "LM",
    TRUE ~ position
  ))
bundesliga_ranger_test21 <- bundesliga_modeling_outfield21 %>% 
  bind_cols(predict(bundesliga_ranger_fit, new_data = bundesliga_modeling_outfield21, type = "prob")) %>% 
  bind_cols(predict(bundesliga_ranger_fit, new_data = bundesliga_modeling_outfield21)) 
## Warning: Novel levels found in column 'Nation': 'ANG', 'ARM', 'BEN', 'BFA',
## 'BUL', 'CAN', 'ECU', 'FRO', 'MKD', 'WAL'. The levels have been removed, and
## values have been coerced to 'NA'.

## Warning: Novel levels found in column 'Nation': 'ANG', 'ARM', 'BEN', 'BFA',
## 'BUL', 'CAN', 'ECU', 'FRO', 'MKD', 'WAL'. The levels have been removed, and
## values have been coerced to 'NA'.
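# RW and LW were recoded to RM and LM above, so only ST and CF can match this filter;
# wingers are ranked with the midfielders in the next block
bundesliga_ranger_test21 %>% 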
  filter(position %in% c("ST", "RW", "CF", "LW")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##               Player revision position Int TklW OG PKcon Nation          Squad
## 1      Wout Weghorst   Normal       ST   6   11  0     0    NED      Wolfsburg
## 2 Robert Lewandowski   Normal       ST   6   12  0     0    POL  Bayern Munich
## 3     Erling Haaland   Normal       ST   4    6  0     0    NOR       Dortmund
## 4        Andre Silva   Normal       ST   4    4  0     0    POR Eint Frankfurt
## 5     Sasa Kalajdzic   Normal       ST   5    6  0     0    AUT      Stuttgart
##   Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1  28 1992 31 2671                        29.7  20   7       18  2     3    3
## 2  32 1988 26 2188                        24.3  36   6       30  6     7    4
## 3  20 2000 26 2227                        24.7  25   5       23  2     4    2
## 4  25 1995 29 2490                        27.7  25   6       19  6     6    1
## 5  23 1997 30 1874                        20.8  14   4       14  0     0    1
##   CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0    0.67    0.24           0.91             0.61                    0.84
## 2    0    1.48    0.25           1.73             1.23                    1.48
## 3    0    1.01    0.20           1.21             0.93                    1.13
## 4    0    0.90    0.22           1.12             0.69                    0.90
## 5    0    0.67    0.19           0.86             0.67                    0.86
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  3 54 32 22  57        610    0.1393530  0.8606470        TOTS
## 2 Matches  1 86 40 46  71         NA    0.2242189  0.7757811        TOTS
## 3 Matches  5 66 42 24  55       1407    0.2291923  0.7708077        TOTS
## 4 Matches  4 62 47 15  56        967    0.2335847  0.7664153        TOTS
## 5 Matches 10 52 51  1  39       1108    0.4769445  0.5230555        TOTS
bundesliga_ranger_test21 %>% 
  filter(position %in% c("CAM", "CM", "CDM", "LM", "RM")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##            Player revision position Int TklW OG PKcon Nation         Squad Age
## 1   Thomas Muller   Normal      CAM  17   32  0     0    GER Bayern Munich  31
## 2  Joshua Kimmich   Normal      CDM  36   29  0     0    GER Bayern Munich  26
## 3      Leroy Sane   Normal       LM  11   19  0     0    GER Bayern Munich  25
## 4 Marcel Sabitzer   Normal       CM  34   25  0     0    AUT    RB Leipzig  27
## 5   Leon Goretzka   Normal       CM  49   27  0     0    GER Bayern Munich  26
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1989 29 2453                        27.3  10  17        9  1     1    0    0
## 2 1995 24 1924                        21.4   3  10        3  0     0    4    0
## 3 1996 29 1672                        18.6   4   9        4  0     0    2    0
## 4 1994 24 1756                        19.5   7   2        4  3     3    6    0
## 5 1995 23 1695                        18.8   5   5        5  0     0    2    0
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.37    0.62           0.99             0.33                    0.95
## 2    0.14    0.47           0.61             0.14                    0.61
## 3    0.22    0.48           0.70             0.22                    0.70
## 4    0.36    0.10           0.46             0.21                    0.31
## 5    0.27    0.27           0.53             0.27                    0.53
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  1 86 40 46  71         NA    0.2481039  0.7518961        TOTS
## 2 Matches  1 86 40 46  71         NA    0.3504694  0.6495306        TOTS
## 3 Matches  1 86 40 46  71         NA    0.3699144  0.6300856        TOTS
## 4 Matches  2 55 25 30  64       1125    0.3781275  0.6218725        TOTS
## 5 Matches  1 86 40 46  71         NA    0.3908449  0.6091551        TOTS
bundesliga_ranger_test21 %>% 
  filter(position %in% c("LB", "CB", "RB")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##           Player revision position Int TklW OG PKcon Nation         Squad Age
## 1    David Alaba   Normal       CB  34   28  0     0    AUT Bayern Munich  28
## 2 Jerome Boateng   Normal       CB  42   17  0     0    GER Bayern Munich  32
## 3    Willi Orban   Normal       CB  29   22  0     0    HUN    RB Leipzig  28
## 4     Ridle Baku   Normal       RB  43   26  0     0    GER     Wolfsburg  23
## 5       Angelino   Normal       LB  29   16  0     0    ESP    RB Leipzig  24
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1992 29 2454                        27.3   2   2        2  0     0    3    0
## 2 1988 26 2148                        23.9   1   1        1  0     0    6    0
## 3 1992 26 2093                        23.3   4   1        4  0     0    4    0
## 4 1998 29 2409                        26.8   6   4        6  0     0    0    0
## 5 1997 24 2042                        22.7   4   4        4  0     0    2    0
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.07    0.07           0.15             0.07                    0.15
## 2    0.04    0.04           0.08             0.04                    0.08
## 3    0.17    0.04           0.22             0.17                    0.22
## 4    0.22    0.15           0.37             0.22                    0.37
## 5    0.18    0.18           0.35             0.18                    0.35
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  1 86 40 46  71         NA    0.4763832  0.5236168        TOTS
## 2 Matches  1 86 40 46  71         NA    0.4878442  0.5121558        TOTS
## 3 Matches  2 55 25 30  64       1125    0.5158685  0.4841315      Normal
## 4 Matches  3 54 32 22  57        610    0.5356797  0.4643203      Normal
## 5 Matches  2 55 25 30  64       1125    0.5560889  0.4439111      Normal



Italian Serie A

fifa19_modeling_serie_a2 <- fifa19_modeling_serie_a %>%
  mutate(Player = paste(Player, '19'))

fifa18_modeling_serie_a2 <- fifa18_modeling_serie_a %>%
  mutate(Player = paste(Player, '18'))

fifa17_modeling_serie_a2 <- fifa17_modeling_serie_a %>%
  mutate(Player = paste(Player, '17'))



serie_a_modeling <- fifa17_modeling_serie_a2 %>% 
  bind_rows(fifa18_modeling_serie_a2, fifa19_modeling_serie_a2) %>% 
  mutate(revision = as.factor(revision), Nation = as.factor(Nation))
serie_a_modeling %>% 
  select(where(is.numeric)) %>% 
  pivot_longer(cols = everything(),
               names_to = "variable", 
               values_to = "value") %>% 
  ggplot(aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(vars(variable), 
             scales = "free")
## Warning: Removed 19578 rows containing non-finite values (stat_bin).

serie_a_modeling %>% 
  ggplot(aes(x = revision, fill = revision)) +
  geom_bar() +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold"))

serie_a_modeling %>% 
  ggplot(aes(x = Gls, fill = revision)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold")) +
  xlab("Goals")

serie_a_modeling %>% 
  ggplot(aes(x = Rk, fill = revision)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold")) +
  xlab("Table Position")

serie_a_modeling %>% 
  ggplot(aes(x = minutes_played_divided_by90, fill = revision)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold")) +
  xlab("Total Minutes Played Divided by 90 (Full Games Played)")

serie_a_modeling %>% 
  ggplot(aes(x = position, fill = revision)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c("TOTS" = "blue", "Normal" = "gold"))

serie_a_modeling_outfield <- serie_a_modeling %>% 
  filter(position != "GK") %>% 
  #filter(position %in% c("ST", "LW", "RW", "CF", "CAM")) %>% 
  filter(minutes_played_divided_by90 >= 19) %>% 
  # Fold wing-backs into full-backs and wingers into wide midfielders
  mutate(position = case_when(
    position == "RWB" ~ "RB",
    position == "LWB" ~ "LB",
    position == "RW" ~ "RM",
    position == "LW" ~ "LM",
    TRUE ~ position
  )) %>% 
  select(-Goals_allowed, -GA90, -SoTA, -Saves, -Save_percent, -W, -L, -D, -CS, -CS_percent, -Pkatt_against, -PKA, -PKsv, -Pk_Save_percent, -PKm)
set.seed(494)
serie_a_split <- initial_split(serie_a_modeling_outfield, prop = .75, strata = "revision")
serie_a_training <- training(serie_a_split)
serie_a_testing <- testing(serie_a_split)
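serie_a_ranger_recipe <- recipe(revision ~ ., data = serie_a_training) %>% 
  # Drop identifier columns and predictors excluded from the model
  step_rm(Player, Nation, Squad, Born, G_per90, A_per90, minutes_played_divided_by90, Attendance) %>% 
  # Upsample the minority class (TOTS) until it reaches 40% of the majority class count
  step_upsample(revision, over_ratio = .4) %>% 
  step_mutate_at(all_numeric(), fn = ~as.numeric(.))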

serie_a_ranger_recipe %>% 
  prep(serie_a_training) %>% 
  juice()
## # A tibble: 502 x 24
##    position   Int  TklW    OG PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1 CB          51    32     0     2    27    37  3258     2     0        2     0
##  2 RM          33    27     0     0    26    35  2516    12     8       10     2
##  3 LB          15    24     0     0    31    24  1807     3     2        3     0
##  4 CM          40    38     0     0    32    34  2741     0     0        0     0
##  5 CDM         28    23     0     0    21    25  1803     0     1        0     0
##  6 CB          18    20     0     1    22    32  2880     0     1        0     0
##  7 CB          25    26     0     1    31    24  1927     0     0        0     0
##  8 ST           9    15     0     2    28    33  2719    11     0       11     0
##  9 LB          15    24     0     0    31    24  1807     3     2        3     0
## 10 CM          37    52     0     0    32    28  1984     0     4        0     0
## # … with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## #   CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
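The prepped training set above has 502 rows, while the explainer output further down reports 412 rows in the raw training set. The difference comes from `step_upsample()`. The base-R sketch below illustrates the arithmetic behind `over_ratio = .4`; it is not the themis implementation, and the class counts (359 Normal, 53 TOTS) are hypothetical values chosen only because they are consistent with those two row totals.

```r
# Minimal base-R sketch of minority upsampling; not the themis implementation.
# ASSUMPTION: 359 "Normal" and 53 "TOTS" rows are hypothetical class counts,
# chosen only because they are consistent with 412 raw and 502 prepped rows.
set.seed(494)
toy <- data.frame(
  id = seq_len(412),
  revision = factor(rep(c("Normal", "TOTS"), times = c(359, 53)))
)

over_ratio <- 0.4
counts <- table(toy$revision)
majority_n <- max(counts)
target_n <- floor(majority_n * over_ratio)  # minority size after upsampling

# Draw the missing minority rows with replacement
minority_rows <- which(toy$revision == names(counts)[which.min(counts)])
extra_n <- max(target_n - length(minority_rows), 0)
extra_rows <- sample(minority_rows, size = extra_n, replace = TRUE)

toy_up <- rbind(toy, toy[extra_rows, , drop = FALSE])
table(toy_up$revision)
nrow(toy_up)
```

Sampling with replacement produces exact duplicates, which is consistent with rows 3 and 9 of the tibble above being identical.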
serie_a_ranger <- rand_forest(mtry = tune(), 
              min_n = tune(), 
              trees = 100) %>% 
  set_mode("classification") %>% 
  set_engine("ranger")

serie_a_ranger_wf <- 
  workflow() %>% 
  add_recipe(serie_a_ranger_recipe) %>% 
  add_model(serie_a_ranger) 

serie_a_ranger_wf
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger
set.seed(494)
serie_a_cv <- vfold_cv(serie_a_training, v = 5)

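# Note: mtry's upper bound is taken from the 31 raw predictor columns, but the recipe
# leaves only 23, which triggers the "31 columns were requested" warnings below.
serie_a_rf_grid <- grid_regular(min_n(), finalize(mtry(), serie_a_training %>% select(-revision)), levels = 3)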

ctrl_res <- control_stack_grid()

serie_a_ranger_cv <- serie_a_ranger_wf %>% 
  tune_grid(resamples = serie_a_cv,
           grid = serie_a_rf_grid,
           control = ctrl_res)
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
collect_metrics(serie_a_ranger_cv)
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.908     5 0.0124  Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.902     5 0.0138  Preprocessor1_Model1
##  3     1    21 accuracy binary     0.896     5 0.0140  Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.888     5 0.0193  Preprocessor1_Model2
##  5     1    40 accuracy binary     0.901     5 0.0102  Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.897     5 0.0181  Preprocessor1_Model3
##  7    16     2 accuracy binary     0.881     5 0.0194  Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.879     5 0.0206  Preprocessor1_Model4
##  9    16    21 accuracy binary     0.869     5 0.0101  Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.877     5 0.0200  Preprocessor1_Model5
## 11    16    40 accuracy binary     0.869     5 0.0114  Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.880     5 0.0236  Preprocessor1_Model6
## 13    31     2 accuracy binary     0.867     5 0.0168  Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.867     5 0.0246  Preprocessor1_Model7
## 15    31    21 accuracy binary     0.857     5 0.00866 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.868     5 0.0251  Preprocessor1_Model8
## 17    31    40 accuracy binary     0.859     5 0.0150  Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.871     5 0.0218  Preprocessor1_Model9
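The tuning warnings above say that 31 columns were requested when only 23 were available, because the `mtry` range was finalized on the raw training columns rather than on the predictors left after `step_rm()`. One possible way to avoid this, sketched below, is to finalize `mtry()` on data with the post-recipe column count. The 23-column placeholder data frame is an assumption for illustration; in the real workflow it would be the prepped training data without `revision`.

```r
library(dials)

# ASSUMPTION: a placeholder data frame with 23 columns stands in for the
# predictors that remain after the recipe's step_rm().
prepped_predictors <- as.data.frame(matrix(0, nrow = 5, ncol = 23))

# The upper bound of mtry is set from the number of columns supplied
mtry_final <- finalize(mtry(), prepped_predictors)

# Same 3 x 3 regular grid as before, now capped at 23 predictors
rf_grid <- grid_regular(min_n(), mtry_final, levels = 3)
sort(unique(rf_grid$mtry))
```

Since the best cross-validated accuracy in the table above occurs at `mtry = 1`, the selected model would probably not change; the adjustment mainly removes the warnings and avoids tuning over values the model cannot use.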
serie_a_best1 <- serie_a_ranger_cv %>% 
  select_best(metric = "accuracy")

serie_a_ranger_final_wf <- serie_a_ranger_wf %>% 
  finalize_workflow(serie_a_best1)
set.seed(494)
serie_a_ranger_fit <- serie_a_ranger_final_wf %>% 
  fit(serie_a_training)


serie_a_rf_explain <- 
  explain_tidymodels(
    model = serie_a_ranger_fit,
    data = serie_a_training %>% select(-revision), 
    y = as.numeric(serie_a_training$revision == "TOTS"),
    label = "rf"
  )
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  412  rows  31  cols 
##   -> target variable   :  412  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0.001597412 , mean =  0.1840812 , max =  0.9732857  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.6253171 , mean =  -0.0554404 , max =  0.7448798  
##   A new explainer has been created! 
serie_a_rf_var_imp <- 
  model_parts(
    serie_a_rf_explain
    )

plot(serie_a_rf_var_imp)
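
As far as we understand it, `model_parts()` reports permutation-based variable importance by default: each variable is shuffled in turn, and the resulting increase in loss measures how much the model relies on it. The base-R sketch below illustrates the idea on made-up data with a logistic regression. The variable names, the data, and the Brier-score loss are all assumptions for illustration; this is not the DALEX code or its default loss.

```r
# Minimal sketch of permutation importance on toy data with a logistic model.
# ASSUMPTION: variable names and data are made up; this is not the DALEX code.
set.seed(494)
n <- 500
toy <- data.frame(goals = rnorm(n), noise = rnorm(n))
toy$tots <- rbinom(n, size = 1, prob = plogis(2 * toy$goals))

fit <- glm(tots ~ goals + noise, data = toy, family = binomial())

# Loss: mean squared difference between outcome and predicted probability
brier <- function(model, data) {
  mean((data$tots - predict(model, newdata = data, type = "response"))^2)
}

baseline <- brier(fit, toy)

# Average increase in loss after shuffling one variable n_rep times
permutation_importance <- function(model, data, variable, n_rep = 10) {
  losses <- replicate(n_rep, {
    shuffled <- data
    shuffled[[variable]] <- sample(shuffled[[variable]])
    brier(model, shuffled)
  })
  mean(losses) - baseline
}

importance <- sapply(c("goals", "noise"), function(v) {
  permutation_importance(fit, toy, v)
})
importance
```

A variable the outcome truly depends on (`goals` here) should show a clearly larger loss increase than an unrelated one (`noise`).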

serie_a_ranger_test <- serie_a_ranger_final_wf %>% 
  last_fit(serie_a_split)

serie_a_ranger_test %>% 
  collect_metrics()
## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.919 Preprocessor1_Model1
## 2 roc_auc  binary         0.943 Preprocessor1_Model1
preds1 <- serie_a_ranger_test %>% 
  collect_predictions()

preds1 %>% 
  conf_mat(revision, .pred_class)
##           Truth
## Prediction Normal TOTS
##     Normal    117    9
##     TOTS        2    8
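# serie_a_ranger_test is reused here for test-set predictions from serie_a_ranger_fit.
# This fit is separate from the last_fit() refit above, so the two confusion matrices can differ slightly.
serie_a_ranger_test <- serie_a_testing %>% 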
  bind_cols(predict(serie_a_ranger_fit, new_data = serie_a_testing, type = "prob")) %>% 
  bind_cols(predict(serie_a_ranger_fit, new_data = serie_a_testing)) 
serie_a_ranger_test %>% 
  conf_mat(revision, .pred_class)
##           Truth
## Prediction Normal TOTS
##     Normal    117    8
##     TOTS        2    9
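Accuracy alone hides how the model treats the rare TOTS class. The sketch below recomputes accuracy, precision, and recall for TOTS from the two confusion matrices printed above (the `last_fit()` result and the separately seeded fit). Only the counts come from the output; the helper name `tots_metrics()` is hypothetical.

```r
# Counts copied from the two confusion matrices above
# (rows = prediction, columns = truth; matrix() fills column by column).
labels <- list(Prediction = c("Normal", "TOTS"), Truth = c("Normal", "TOTS"))
cm_last_fit <- matrix(c(117, 2, 9, 8), nrow = 2, dimnames = labels)
cm_seeded   <- matrix(c(117, 2, 8, 9), nrow = 2, dimnames = labels)

# Hypothetical helper: metrics with TOTS treated as the positive class
tots_metrics <- function(cm) {
  c(
    accuracy  = sum(diag(cm)) / sum(cm),
    precision = cm["TOTS", "TOTS"] / sum(cm["TOTS", ]),
    recall    = cm["TOTS", "TOTS"] / sum(cm[, "TOTS"])
  )
}

round(rbind(
  last_fit   = tots_metrics(cm_last_fit),
  seeded_fit = tots_metrics(cm_seeded)
), 3)
```

The first row reproduces the 0.919 accuracy reported by `collect_metrics()`. In both fits roughly half of the 17 actual TOTS players in the test set are recovered, while most players the model does flag as TOTS are correct.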
serie_a_ranger_test %>% 
  filter(revision != .pred_class)
##                   Player revision position Int TklW OG PKcon Nation      Squad
## 1      Mattia Caldara 17     TOTS       CB  90   36  0     0    ITA   Atalanta
## 2   Giorgio Chiellini 18     TOTS       CB  28   15  0     0    ITA   Juventus
## 3     Federico Chiesa 18     TOTS       RM   9   37  0     0    ITA Fiorentina
## 4          Edin Dzeko 18   Normal       ST   2   10  0     0    BIH       Roma
## 5  Fabio Quagliarella 18     TOTS       ST   8    9  0     0    ITA  Sampdoria
## 6            Emre Can 19     TOTS       CM  21   58  1     1    GER   Juventus
## 7   Giorgio Chiellini 19     TOTS       CB  23    9  0     0    ITA   Juventus
## 8     Rodrigo De Paul 19     TOTS       CM  36   31  0     1    ARG    Udinese
## 9     Mario Mandzukic 19   Normal       ST  12   22  0     0    CRO   Juventus
## 10              Allan 19     TOTS       CM  16   92  0     0    BRA     Napoli
##    Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1   22 1994 30 2655                        29.5   7   0        7  0     0    4
## 2   32 1984 26 2161                        24.0   0   1        0  0     0    2
## 3   19 1997 36 3012                        33.5   6   4        6  0     0    7
## 4   31 1986 36 3018                        33.5  16   3       16  0     0    6
## 5   34 1983 35 2719                        30.2  19   5       12  7     8    4
## 6   24 1994 29 1811                        20.1   4   1        3  1     1    7
## 7   33 1984 25 1991                        22.1   1   1        1  0     0    3
## 8   24 1994 36 3189                        35.4   9   9        6  3     6    7
## 9   32 1986 25 2014                        22.4   9   6        9  0     0    4
## 10  27 1991 33 2616                        29.1   1   3        1  0     0   10
##    CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1     0    0.24    0.00           0.24             0.24                    0.24
## 2     0    0.00    0.04           0.04             0.00                    0.04
## 3     0    0.18    0.12           0.30             0.18                    0.30
## 4     0    0.48    0.09           0.57             0.48                    0.57
## 5     0    0.63    0.17           0.79             0.40                    0.56
## 6     0    0.20    0.05           0.25             0.15                    0.20
## 7     0    0.05    0.05           0.09             0.05                    0.09
## 8     0    0.25    0.25           0.51             0.17                    0.42
## 9     0    0.40    0.27           0.67             0.40                    0.67
## 10    0    0.03    0.10           0.14             0.03                    0.14
##    Rk GF GA  GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1   4 62 41  21  72      16948    0.5831035  0.4168965      Normal
## 2   1 86 24  62  95      39316    0.8128594  0.1871406      Normal
## 3   8 54 46   8  57      26092    0.6652006  0.3347994      Normal
## 4   3 61 28  33  77      37450    0.4981857  0.5018143        TOTS
## 5  10 56 60  -4  54      20156    0.7249446  0.2750554      Normal
## 6   1 70 30  40  90      37799    0.6721292  0.3278708      Normal
## 7   1 70 30  40  90      37799    0.8219823  0.1780177      Normal
## 8  12 39 53 -14  43      20414    0.8304859  0.1695141      Normal
## 9   1 70 30  40  90      37799    0.4826763  0.5173237        TOTS
## 10  2 74 36  38  79      29003    0.7006194  0.2993806      Normal
serie_a_modeling21 <- fifa21_modeling_serie_a %>% 
  mutate(revision = as.factor(revision), Nation = as.factor(Nation), Age = as.integer(Age))
serie_a_modeling_outfield21 <- serie_a_modeling21 %>% 
  filter(position != "GK") %>% 
  #filter(position %in% c("ST", "LW", "RW", "CF", "CAM")) %>% 
  filter(minutes_played_divided_by90 >= 14) %>% 
  # Same position recoding as for the training data
  mutate(position = case_when(
    position == "RWB" ~ "RB",
    position == "LWB" ~ "LB",
    position == "RW" ~ "RM",
    position == "LW" ~ "LM",
    TRUE ~ position
  ))
serie_a_ranger_test21 <- serie_a_modeling_outfield21 %>% 
  bind_cols(predict(serie_a_ranger_fit, new_data = serie_a_modeling_outfield21, type = "prob")) %>% 
  bind_cols(predict(serie_a_ranger_fit, new_data = serie_a_modeling_outfield21)) 
## Warning: Novel levels found in column 'Nation': 'ARM', 'EQG', 'RUS', 'UKR',
## 'USA', 'WAL'. The levels have been removed, and values have been coerced to
## 'NA'.

## Warning: Novel levels found in column 'Nation': 'ARM', 'EQG', 'RUS', 'UKR',
## 'USA', 'WAL'. The levels have been removed, and values have been coerced to
## 'NA'.
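The warning above appears because `Nation` was stored as a factor in the training data, and the 2021 data contains nations the model never saw. The base-R sketch below shows the same effect in isolation; the level sets are a hypothetical subset, and this is an analogy rather than the code that tidymodels runs internally.

```r
# Levels seen during training (hypothetical subset) versus new 2021 values
training_levels <- c("ITA", "ARG", "BRA")
new_nation <- c("ITA", "USA", "WAL", "BRA")

# Forcing new values onto the training levels turns unseen levels into NA
coerced <- factor(new_nation, levels = training_levels)
coerced
is.na(coerced)
```

Because the recipe removes `Nation` with `step_rm()` before the model is fitted, these coerced `NA` values should not affect the predictions.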
serie_a_ranger_test21 %>% 
  # RW and LW were recoded to RM and LM above, so only ST and CF can match here
  filter(position %in% c("ST", "RW", "CF", "LW")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##              Player revision position Int TklW OG PKcon Nation    Squad Age
## 1     Romelu Lukaku   Normal       ST   2    3  0     0    BEL    Inter  27
## 2 Cristiano Ronaldo   Normal       ST   5    2  0     0    POR Juventus  36
## 3  Lautaro Martinez   Normal       ST  26   15  0     0    ARG    Inter  23
## 4      Duvan Zapata   Normal       ST   9    6  0     0    COL Atalanta  30
## 5     Ciro Immobile   Normal       ST   4    9  0     0    ITA    Lazio  31
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1993 32 2580                        28.7  21   9       16  5     5    4    0
## 2 1985 29 2463                        27.4  25   2       20  5     6    3    0
## 3 1997 33 2238                        24.9  15   5       15  0     0    3    0
## 4 1991 32 2052                        22.8  14   7       13  1     1    0    0
## 5 1990 30 2399                        26.7  18   5       15  3     6    4    1
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.73    0.31           1.05             0.56                    0.87
## 2    0.91    0.07           0.99             0.73                    0.80
## 3    0.60    0.20           0.80             0.60                    0.80
## 4    0.61    0.31           0.92             0.57                    0.88
## 5    0.68    0.19           0.86             0.56                    0.75
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  1 72 29 43  79        125    0.1496536  0.8503464        TOTS
## 2 Matches  3 65 30 35  66        118    0.2607361  0.7392639        TOTS
## 3 Matches  1 72 29 43  79        125    0.3807273  0.6192727        TOTS
## 4 Matches  2 78 39 39  68        118    0.3960695  0.6039305        TOTS
## 5 Matches  6 56 46 10  61        188    0.4556741  0.5443259        TOTS
serie_a_ranger_test21 %>% 
  filter(position %in% c("CAM", "CM", "CDM", "LM", "RM")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##            Player revision position Int TklW OG PKcon Nation    Squad Age Born
## 1 Matteo Politano   Normal       RM  19   11  0     0    ITA   Napoli  27 1993
## 2  Hirving Lozano   Normal       RM  13   18  0     0    MEX   Napoli  25 1995
## 3 Lorenzo Insigne   Normal       LM  34   11  0     0    ITA   Napoli  29 1991
## 4    Robin Gosens   Normal       LM  38   26  1     0    GER Atalanta  26 1994
## 5 Piotr Zielinski   Normal       CM  16   14  0     0    POL   Napoli  26 1994
##   MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 32 1696                        18.8   9   4        9  0     0    3    0
## 2 27 1727                        19.2   9   3        9  0     0    5    0
## 3 30 2415                        26.8  17   6       10  7     7    2    1
## 4 27 2143                        23.8   8   6        8  0     0    9    1
## 5 31 2154                        23.9   6   8        6  0     0    2    0
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.48    0.21           0.69             0.48                    0.69
## 2    0.47    0.16           0.63             0.47                    0.63
## 3    0.63    0.22           0.86             0.37                    0.60
## 4    0.34    0.25           0.59             0.34                    0.59
## 5    0.25    0.33           0.58             0.25                    0.58
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  4 73 37 36  66        125    0.4478159  0.5521841        TOTS
## 2 Matches  4 73 37 36  66        125    0.4575231  0.5424769        TOTS
## 3 Matches  4 73 37 36  66        125    0.4712953  0.5287047        TOTS
## 4 Matches  2 78 39 39  68        118    0.4928593  0.5071407        TOTS
## 5 Matches  4 73 37 36  66        125    0.4965284  0.5034716        TOTS
serie_a_ranger_test21 %>% 
  filter(position %in% c("LB", "CB", "RB")) %>% 
  arrange(desc(.pred_TOTS)) %>% 
  head(5)
##                Player revision position Int TklW OG PKcon Nation    Squad Age
## 1     Cristian Romero   Normal       CB  70   42  0     0    ARG Atalanta  23
## 2       Juan Cuadrado   Normal       RB  20   21  0     0    COL Juventus  32
## 3      Milan Skriniar   Normal       CB  26   26  0     0    SVK    Inter  26
## 4        Rafael Toloi   Normal       CB  35   28  0     1    ITA Atalanta  30
## 5 Giovanni Di Lorenzo   Normal       RB  29   50  0     1    ITA   Napoli  27
##   Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY CrdR
## 1 1998 26 2095                        23.3   2   2        2  0     0   10    0
## 2 1988 25 1812                        20.1   0  10        0  0     0    8    1
## 3 1995 29 2507                        27.9   3   0        3  0     0    1    0
## 4 1990 28 2283                        25.4   2   0        2  0     0    7    0
## 5 1993 31 2790                        31.0   2   5        2  0     0   11    0
##   G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1    0.09    0.09           0.17             0.09                    0.17
## 2    0.00    0.50           0.50             0.00                    0.50
## 3    0.11    0.00           0.11             0.11                    0.11
## 4    0.08    0.00           0.08             0.08                    0.08
## 5    0.06    0.16           0.23             0.06                    0.23
##   Matches Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 Matches  2 78 39 39  68        118    0.5632139  0.4367861      Normal
## 2 Matches  3 65 30 35  66        118    0.6404613  0.3595387      Normal
## 3 Matches  1 72 29 43  79        125    0.6667806  0.3332194      Normal
## 4 Matches  2 78 39 39  68        118    0.6965193  0.3034807      Normal
## 5 Matches  4 73 37 36  66        125    0.7191976  0.2808024      Normal
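
The same filter, sort, and `head()` pattern is repeated for attackers, midfielders, and defenders in each league. A small helper could express it once. In the sketch below, `top_candidates()` and the toy predictions are hypothetical; only the column names `position` and `.pred_TOTS` come from the output above.

```r
library(dplyr)

# ASSUMPTION: `top_candidates()` is a hypothetical helper, not part of the analysis.
# It returns the n players with the highest predicted TOTS probability
# among the given positions.
top_candidates <- function(preds, positions, n = 5) {
  preds %>%
    filter(position %in% positions) %>%
    arrange(desc(.pred_TOTS)) %>%
    head(n)
}

# Toy predictions standing in for a table such as serie_a_ranger_test21
toy_preds <- tibble(
  Player = c("A", "B", "C", "D"),
  position = c("ST", "CM", "ST", "CB"),
  .pred_TOTS = c(0.40, 0.70, 0.85, 0.30)
)

top_candidates(toy_preds, c("ST", "CF"), n = 1)
```

With the real data, a call such as `top_candidates(serie_a_ranger_test21, c("LB", "CB", "RB"))` would be expected to match the defender table above.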



All Leagues Combined (and why it does not work)